SOME IMPROVEMENTS ON LOCALLY REPAIRABLE CODES


JUN ZHANG, XIN WANG, AND GENNIAN GE 


Abstract. Locally repairable codes (LRCs) were introduced to correct erasures efficiently
in distributed storage systems, and they have been studied extensively in recent years. In
this paper, we first deal with the open case left in [40] and derive an improved upper bound
on the minimum distance of LRCs. We also give an explicit construction of LRCs attaining
this bound. Secondly, we consider constructions of LRCs with any locality and availability
that have high code rate and minimum distance as large as possible. We give a graphical
model for LRCs. Using deep results from graph theory, we construct a family of LRCs
with any locality $r$ and availability 2 with code rate $\frac{r-1}{r+1}$ and optimal minimum
distance $O(\log n)$, where $n$ is the length of the code.

1. Introduction 

In distributed storage systems, redundancy must be introduced to protect data against
device failures. The simplest and most widespread technique used for data recovery is
replication. However, this strategy entails large storage overhead and does not adapt well
to modern systems supporting the “Big Data” environment. To improve the storage efficiency,
erasure codes are employed, for example in Windows Azure [16] and Facebook’s Hadoop
cluster [32], where the original data are divided into $k$ equal-sized fragments and then
encoded into $n$ fragments ($n > k$) stored in $n$ different nodes. Such a system can tolerate
up to $d - 1$ node failures, where $d$ is the minimum distance of the erasure code. In
particular, a maximum distance separable (MDS) code is an erasure code that attains the
optimal minimum distance with respect to the Singleton bound and thus provides the highest
level of fault tolerance for a given storage overhead. However, MDS codes are inefficient in
terms of disk I/O complexity, repair bandwidth and so on.

To improve this, Gopalan et al. [13], Oggier and Datta [25], and Papailiopoulos et al. [28]
introduced the concept of repair locality for erasure codes. The $i$th coordinate of a code
has repair locality $r$ if it can be recovered by accessing at most $r$ other coordinates. In this
paper, an LRC refers to an $[n, k]$ linear code with all symbol locality $r$. When $r \ll k$,
locality greatly reduces the disk I/O complexity of repair.

Considering the fault tolerance level, the minimum distance is also a key metric for LRCs.
Gopalan et al. [13] first derived the following upper bound for codes with information locality:

(1.1) $d \le n - k + 1 - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)$,

which is a tight bound by the construction of pyramid codes [15]. Although the bound (1.1)
certainly holds for all LRCs, it is not tight in many cases. Later, in [9jj2?], the bound (1.1)
was generalized to vector codes and nonlinear codes. In order to consider multiple erasures
in local repair, two different models were put forward independently by Prakash et al. [25]
and Wang et al. [42].

For simplicity, an LRC that achieves the upper bound (1.1) with equality is called an
optimal (maximum) LRC in this paper. The first optimal LRCs for the case $(r+1) \mid n$ were
constructed explicitly in [39] and [33] by using Reed-Solomon codes and Gabidulin codes,
respectively. Both constructions were built over a finite field whose size is exponential in
the code length $n$. In [37], for the same case $(r+1) \mid n$, the authors constructed an optimal
code over a finite field of size comparable to $n$ by using specially designed polynomials.
This construction can be extended to the case $(r+1) \nmid n$ with the minimum distance
$d \ge n - k - \lceil \frac{k}{r} \rceil + 1$, which is at most one less than the upper bound (1.1). In [Q0SII15], the
authors generalized this idea to cyclic codes and algebraic geometry codes.

Recently, Song et al. [35] carefully studied the tightness of the bound (1.1) and left two
open cases. Another recent improvement was due to [30], where Prakash et al. showed a new
upper bound on the minimum distance for LRCs. This bound relies on a sequence of
recursively defined parameters and is tighter than the bound (1.1), but no general construction
attaining this new bound was presented. A great improvement for this problem was made by
Wang and Zhang in [40]. The authors carried out an in-depth study of two problems:
what is the largest possible minimum distance for an $[n, k]$ LRC, and how can one construct
an $[n, k]$ LRC with the largest possible minimum distance? For the first problem, they derived
an integer programming based upper bound on the minimum distance for LRCs, and then gave an
explicit bound by solving the integer programming problem. The explicit bound applies to all
LRCs satisfying $n_1 > n_2$, where $n_1 = \lceil \frac{n}{r+1} \rceil$ and $n_2 = n_1(r+1) - n$. For the second
problem, they presented a construction of linear LRCs that attains the explicit bound for
$n_1 > n_2$. Therefore, they completely solved the two problems under the condition $n_1 > n_2$. A
similar result can be found in [44] using matroid theory.

In this paper, we first deal with the open case left in [40] and derive an improved
upper bound on the minimum distance of LRCs. We also give an explicit construction of
LRCs attaining this bound.

Many other works are devoted to locality in the handling of multiple node
failures, such as [3TH361I37], which consider LRCs that permit parallel access to “hot data”, the
works [26JHT], which study LRCs with general local repair groups, and the work [30], which
proposed sequential local repair. Very recently, Wang et al. [42] proposed a binary LRC
construction achieving any locality and availability with very high code rate. A code
$C[n, k, d]$ is said to have locality $r$ and availability $t$ if, for any codeword $y \in C$, any symbol
$y_i$ of $y$ can be computed from some other $r$ symbols of $y$, and furthermore there are $t$ disjoint
ways to reconstruct $y_i$. Unfortunately, the minimum distance of the codes constructed in [42]
is small, namely $t + 1$.

The second part of this paper deals with constructions of binary LRCs $C[n, k, d]$ with any
locality $r$ and availability 2 which have both high code rate and large minimum distance.
We first give a graphical model for binary LRCs. We then use graphs with large girth to give
a high rate code construction. Compared with the constructions [3I)]]42j, our codes have a
slightly lower rate, but a much larger minimum distance ($d = O(\log n)$).
This paper is organized as follows. Section 2 reviews some elementary results that will be
used in this paper. Section 3 solves the integer programming problem put forward in [40],
and gives an explicit upper bound for LRCs satisfying $n_1 \le n_2$. Then Section 4 presents an
explicit construction attaining this bound. Section 5 gives a construction of a family of LRCs
with any locality $r$ and availability 2 having code rate $\frac{r-1}{r+1}$ and minimum distance $O(\log n)$,
where $n$ is the length of the code. Finally, Section 6 concludes the paper.


2. Preliminaries 


In [40], the authors derived an integer programming based bound on the minimum distance
of any LRC. Define

(2.1) $T(x) = \max_{s,\, t_1, \dots, t_s,\, a_1, \dots, a_s} \ \min_{l,\, h_1, \dots, h_l} \left( xr + 1 - \sum_{i=1}^{l-1} (a_{h_i} - t_{h_i}) \right), \quad \forall\, 1 \le x \le n_1,$

where $s, t_1, \dots, t_s, a_1, \dots, a_s$ satisfy

$\begin{cases} t_1 + \cdots + t_s = n_1; \\ a_1 + \cdots + a_s = n_2; \\ a_i \ge t_i - 1, \ \forall\, 1 \le i \le s; \\ s \ge 1; \ t_i \ge 1, \ \forall\, 1 \le i \le s, \end{cases}$

and $l, h_1, \dots, h_l$ satisfy

(2.2) $t_{h_1} + \cdots + t_{h_{l-1}} < x \le t_{h_1} + \cdots + t_{h_l}.$


Theorem 2.1 ([40]). For any $[n, k, d]$ LRC,

$d \le n - k + 1 - \eta,$

where $\eta = \max\{x : T(x) - x < k\}$.


Next, we review the construction of Tamo and Barg [37], as their construction gives some
optimal codes for the bound we will obtain later. Furthermore, we will employ their
construction to get more optimal codes meeting our bound.

Let $A \subset F$, and let $\mathcal{A}$ be a partition of $A$ into $m$ subsets $A_i$. Consider the set of polynomials
$F_{\mathcal{A}}[x]$ of degree less than $|A|$ that are constant on the blocks of the partition:

$F_{\mathcal{A}}[x] = \{ f \in F[x] : f \text{ is constant on } A_i,\ i = 1, \dots, m;\ \deg f < |A| \}.$

The annihilator of $A$ is the smallest-degree monic polynomial $h_A$ such that $h_A(a) = 0$ if
$a \in A$, i.e., $h_A(x) = \prod_{a \in A} (x - a)$. Observe that the set $F_{\mathcal{A}}[x]$ with the usual addition
and multiplication modulo $h_A(x)$ becomes a commutative algebra with identity. Since the
polynomials in $F_{\mathcal{A}}[x]$ are constant on the sets of the partition $\mathcal{A}$, we write $f(A_i)$ to refer to
the value of the polynomial $f$ on the set $A_i \in \mathcal{A}$.


Proposition 2.2 ([37]). Let $\alpha_1, \dots, \alpha_m$ be distinct nonzero elements of $F$, and let $g$ be the
polynomial of degree $\deg(g) < |A|$ that satisfies $g(A_i) = \alpha_i$ for all $i = 1, \dots, m$, i.e.,

$g(x) = \sum_{i=1}^{m} \alpha_i \sum_{a \in A_i} \prod_{b \in A \setminus a} \frac{x - b}{a - b}.$

Then the polynomials $1, g, \dots, g^{m-1}$ form a basis of $F_{\mathcal{A}}[x]$.
Proposition 2.3 ([37]). There exist $m$ integers $0 = d_0 < d_1 < \cdots < d_{m-1} < |A|$ such that
the degree of each polynomial in $F_{\mathcal{A}}[x]$ is $d_i$ for some $i$.

Corollary 2.4 ([37]). Assume that $d_1 = r + 1$, namely there exists a polynomial $g$ in $F_{\mathcal{A}}[x]$
of degree $r + 1$, then $d_i = i(r+1)$ for all $i = 0, \dots, m-1$, and the polynomials $1, g, \dots, g^{m-1}$
defined in Proposition 2.2 form a basis for $F_{\mathcal{A}}[x]$.
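
As a small illustration of polynomials that are constant on the blocks of a partition, the following sketch checks a toy case. The field $F_{13}$, the value $r = 2$, the blocks (cosets of the subgroup $\{1, 3, 9\}$) and the choice $g(x) = x^3$ are assumptions made here for illustration; they are not parameters taken from this paper.

```python
# Hypothetical toy parameters: F = GF(13), r = 2, blocks = cosets of {1, 3, 9}.
P = 13
BLOCKS = [[1, 3, 9], [2, 6, 5], [4, 12, 10]]


def g(x: int) -> int:
    """Candidate polynomial g(x) = x^3 over GF(13), of degree r + 1 = 3."""
    return pow(x, 3, P)


def block_values(power: int) -> list[set[int]]:
    """Values taken by g^power on each block; a singleton set means 'constant'."""
    return [{pow(g(a), power, P) for a in block} for block in BLOCKS]


# The powers 1, g, g^2 have degrees 0, 3, 6 = i(r + 1), all below |A| = 9,
# and each of them should be constant on every block.
for i in range(len(BLOCKS)):
    values = block_values(i)
    assert all(len(v) == 1 for v in values)
    print(f"g^{i}: degree {3 * i}, block values {[next(iter(v)) for v in values]}")
```

In this toy case the number of blocks is $m = 3$, so the three powers checked above are exactly the candidates for a basis in the sense of Corollary 2.4.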

Construction 2.5 ([37]). 1. Let $F$ be a finite field, and let $A \subset F$ be a subset such that
$|A| = n$, $n \bmod (r+1) = s \ne 0, 1$. Assume also that $k + 1$ is divisible by $r$ (this assumption
is nonessential).

2. Let $\mathcal{A}$ be a partition of $A$ into $m$ subsets $A_1, \dots, A_m$ such that $|A_i| = r + 1$, $1 \le i \le m - 1$,
and $1 < |A_m| = s < r + 1$. Let $g(x)$ be a polynomial of degree $r + 1$, such that its powers
$1, g, \dots, g^{m-1}$ span the algebra $F_{\mathcal{A}}[x]$. W.L.O.G., assume that $g$ vanishes on the set $A_m$,
otherwise one can take the powers of the polynomial $g(x) - g(A_m)$ as the basis for the algebra.

3. Let $a = (a_0, \dots, a_{r-1}) \in F^k$ be the input information vector, such that each $a_i$ for
$i \ne s - 1$ is a vector of length $\frac{k+1}{r}$ and $a_{s-1}$ is of length $\frac{k+1}{r} - 1$. Define the encoding
polynomial

$f_a(x) = \sum_{i=0}^{s-2} \sum_{j=0}^{\frac{k+1}{r}-1} a_{i,j}\, g(x)^j x^i + \sum_{j=1}^{\frac{k+1}{r}-1} a_{s-1,j}\, g(x)^j x^{s-1} + \sum_{i=s}^{r-1} \sum_{j=0}^{\frac{k+1}{r}-1} a_{i,j}\, g(x)^j x^{i-s} h_{A_m}(x).$

The code is defined as the set of evaluations of $f_a(x)$, $a \in F^k$.

Theorem 2.6 ([37]). The code given by Construction 2.5 is an $[n, k, r]$ LRC code with
minimum distance satisfying

$d \ge n - k - \left\lceil \frac{k}{r} \right\rceil + 1.$


3. Upper Bounds of the Minimum Distance 


In this section, we solve the integer programming problem (2.1) and derive an explicit
upper bound for all LRCs satisfying $n_1 \le n_2$. Then we make comparisons with the bound (1.1)
to show the improvements of our explicit bound. Actually, in the next section we will show
our bound is tight for the case $n_1 \le n_2$.


Theorem 3.1. For $1 \le x \le n_1$ and $n_1 \le n_2$,

$T(x) = xr + 1.$


Proof. 1. Set

$s = 1, \quad t_1 = n_1, \quad a_1 = n_2.$

Then the only admissible choice in (2.2) is $l = 1$, $h_1 = 1$, so the sum is empty and we have

$T(x) \ge \min_{l,\, h_1, \dots, h_l} \left( xr + 1 - \sum_{i=1}^{l-1} (a_{h_i} - t_{h_i}) \right) = xr + 1.$

2. Assume that for some $1 \le x \le n_1$,

$T(x) \ge xr + 2.$

Then there exist integers $s$ and $t_i, a_i$, $1 \le i \le s$, satisfying the constraints of the integer
programming and

$\min_{l,\, h_1, \dots, h_l} \left( xr + 1 - \sum_{i=1}^{l-1} (a_{h_i} - t_{h_i}) \right) \ge xr + 2.$

Therefore for all integers $l$ and $h_1, \dots, h_l \in [s]$ satisfying the constraint (2.2), we have

(3.1) $\sum_{i=1}^{l-1} (a_{h_i} - t_{h_i}) \le -1.$

If there is some $i$ such that $t_i \ge x$, let $h_1 = i$ in the constraint (2.2), then

$\sum_{i=1}^{l-1} (a_{h_i} - t_{h_i}) = 0,$

which contradicts (3.1). Hence the assumption $T(x) \ge xr + 2$ does not hold, and we
finish the proof.

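So we assume $t_i < x$, $\forall\, 1 \le i \le s$. For $1 \le i \le s$, define

$b_i = a_i - t_i.$

W.L.O.G., we assume that $b_1 \le b_2 \le \cdots \le b_s$. Since $t_i < x$, we can find $i_0 = 1, i_1, i_2, \dots, i_p \le s$
satisfying that

$t_1 + \cdots + t_{i_1 - 1} < x \le t_1 + \cdots + t_{i_1},$

$t_{i_1} + \cdots + t_{i_2 - 1} < x \le t_{i_1} + \cdots + t_{i_2},$

$\cdots$

$t_{i_{p-1}} + \cdots + t_{i_p - 1} < x \le t_{i_{p-1}} + \cdots + t_{i_p},$

$t_{i_p} + \cdots + t_s < x.$

Then we have

$\sum_{i=1}^{i_1 - 1} b_i \le -1, \quad \sum_{i=i_1}^{i_2 - 1} b_i \le -1, \quad \dots, \quad \sum_{i=i_{p-1}}^{i_p - 1} b_i \le -1.$
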


Noting $\sum_{i=1}^{s} b_i = n_2 - n_1 \ge 0$, we get $\sum_{i=i_p}^{s} b_i > 0$. Because $x \le n_1$, we have $p \ge 1$, so we
can consider the last two parts of the $t_i$'s in the reverse order $t_s, \dots, t_{i_p}, \dots, t_{i_{p-1}}$. From the
definition of $i_p$, we know $\sum_{m=i_{p-1}}^{s} t_m \ge x$ and $\sum_{m=i_p}^{s} b_m > 0$. So there exists $q$ satisfying

$\sum_{m=q+1}^{s} t_m < x \le \sum_{m=q}^{s} t_m$ and $\sum_{m=q+1}^{s} b_m \le -1$. On the other hand, since $b_{i_{p-1}} \le b_{i_p} \le \cdots \le b_s$, we

get $b_{i_{p-1}}, \dots, b_q \le 0$, which contradicts $\sum_{m=i_p}^{s} b_m > 0$.

Thus we have $T(x) \le xr + 1$. □
Theorem 3.2. For any $[n, k, d]$ LRC with $n_1 \le n_2$, where $n_1 = \lceil \frac{n}{r+1} \rceil$ and $n_2 = n_1(r+1) - n$,
it holds that

(3.2) $d \le n - k + 1 - \left( \left\lceil \frac{k-1}{r-1} \right\rceil - 1 \right).$

Proof. This follows from Theorems 2.1 and 3.1. □


Since the bound (13. 2 p in Theorem 13.21 holds for ri\ < n 2 , all the comparisons we make 
below are under the condition n± < n 2 . 

The bound (11.111 given by Gopalan et al. [d] is the first upper bound on the minimum 
distance of LRCs. Since r < k (the natural condition that LRCs require), we always have 
r^5^1 > rjl • So the bound (13. 2\) generally provides a tighter upper bound than the bound 

(tH)- 

Specially, we assume k = ur + v for some integers u, v and 0 < v < r — 1, then 


d<n — k + 1 — (f-—— 1) 
r — 1 


n — k — u + 1, u + v < r; 
u — k — u. u + v > r. 
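
To make the comparison between the bounds (1.1) and (3.2) concrete, here is a minimal numerical sketch. The function names and the sample parameters $n = 20$, $k = 14$, $r = 5$ (for which $n_1 = n_2 = 4$) are assumptions chosen for illustration only.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive b, using integer arithmetic only."""
    return -(-a // b)


def bound_1_1(n: int, k: int, r: int) -> int:
    """Bound (1.1): d <= n - k + 1 - (ceil(k / r) - 1)."""
    return n - k + 1 - (ceil_div(k, r) - 1)


def bound_3_2(n: int, k: int, r: int) -> int:
    """Bound (3.2): d <= n - k + 1 - (ceil((k - 1) / (r - 1)) - 1); needs r >= 2.

    The paper states this bound under the condition n_1 <= n_2; the function
    itself only evaluates the formula and does not check that condition.
    """
    return n - k + 1 - (ceil_div(k - 1, r - 1) - 1)


# Hypothetical parameters: n_1 = ceil(20 / 6) = 4 and n_2 = 4 * 6 - 20 = 4.
n, k, r = 20, 14, 5
u, v = divmod(k, r)  # k = u * r + v with 0 <= v <= r - 1
print("bound (1.1):", bound_1_1(n, k, r))
print("bound (3.2):", bound_3_2(n, k, r))
print("u + v > r:", u + v > r, "and n - k - u =", n - k - u)
```

For these sample values $u + v = 6 > r$, so the second branch of the case distinction applies and the explicit bound is one smaller than (1.1).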


4. Code Construction When $n_1 \le n_2$

In this section, we present an explicit construction of LRCs attaining the bound (3.2) in
some cases. The idea of the construction comes from [37].

Theorem 4.1. When $n_1 \le n_2$, $u + v > r$ and $n_2 \ne r$, the bound (3.2) is achievable.

Proof. This follows from Theorems 2.6 and 3.1. □

Modifying the construction in the above theorem, we show that the bound (3.2) is also
tight in other cases.

Construction 4.2. Let $F$ be a finite field, and let $A \subset F$ be a subset such that $|A| = n$.

(1) Since $n = n_1(r+1) - n_2 = n_1 r - (n_2 - n_1)$, let $\mathcal{A}$ be a partition of $A$ into $n_1$ subsets
$A_1, \dots, A_{n_1}$ such that $|A_i| = r$, $1 \le i \le n_1 - 1$, and $|A_{n_1}| = s = r - (n_2 - n_1) \ge 1$. Let
$g(x)$ be a polynomial of degree $r$, such that its powers $1, g, \dots, g^{n_1 - 1}$ span the algebra
$F_{\mathcal{A}}[x]$. W.L.O.G., we assume that $g$ vanishes on the set $A_{n_1}$ and $u + v = s$ (this
assumption is nonessential, in fact we only need $u + v \le s$).

(2) Let $a = (a_0, \dots, a_{r-2}) \in F^k$ be the input information vector, such that $a_i$ is a vector
of length $u + 1$ for $0 \le i \le s - 1$ and $a_i$ is a vector of length $u$ for $i \ge s$. Define the
encoding polynomial

$f_a(x) = \sum_{i=0}^{s-1} \sum_{j=0}^{u} a_{i,j}\, g(x)^j x^i + \sum_{i=s}^{r-2} \sum_{j=0}^{u-1} a_{i,j}\, g(x)^j x^{i-s} h_{A_{n_1}}(x),$

where $h_{A_{n_1}}(x) = \prod_{a \in A_{n_1}} (x - a)$.

The code is defined as the set of evaluations of $f_a(x)$, $a \in F^k$.

Theorem 4.3. Keep the notation as above. The code given in Construction 4.2 is an $[n, k]$
LRC with locality $r - 1$ and minimum distance

$d \ge n - k - u + 1.$
Proof. Since the encoding is linear and the encoding polynomials have degree at most

$\max\{ur + s - 1,\ (u - 1)r + r - 1\} = ur + u + v - 1 = k + u - 1,$

we have $d \ge n - \deg(f_a) \ge n - k - u + 1$. The locality property is similar to that of Construction 2.5.
If the erased symbol $f_a(x)$ has $x \in A_{n_1}$, then by interpolating the other $s - 1$ points in $A_{n_1}$
we get a polynomial of degree at most $s - 2$ that recovers $f_a(x)$. Otherwise, we use $r - 1$
interpolation points to find a polynomial of degree at most $r - 2$ that recovers $f_a(x)$. So it is
an LRC with locality $r - 1$. The result follows. □

As a corollary, we obtain a wider range in which the bound (3.2) is tight.

Corollary 4.1. When $n_1 \le n_2$ and $u + v + n_2 - n_1 \le r$, the bound (3.2) is achievable.

Proof. We can view the code constructed above as an LRC with locality $r$, then the corollary
follows directly from Theorems 3.2 and 4.3. □


5. Graph-based Construction of LRCs with Arbitrary Locality and Availability 2


Very recently, Wang et al. [42] proposed a binary LRC construction achieving any locality
and availability with very high code rate. In this section, we first give a graphical model
for binary LRCs. Secondly, we consider the special case $t = 2$, i.e., there are two disjoint
repair ways for any coordinate, and give a high rate code construction. Compared with
the construction in [42], our codes have a slightly lower rate, but much
larger minimum distances.

Recall that an LRC $C[n, k, d]$ with locality $r$ and availability $t$ satisfies the following
property: for any codeword $y \in C$, any symbol $y_i$ of $y$ can be computed from some other $r$
symbols of $y$, and furthermore there are $t$ disjoint ways to reconstruct $y_i$.


Proposition 5.1 ([36]). For a linear code $C[n, k, d]$ with locality $r$ and availability $t$, the
rate of the code satisfies

$\frac{k}{n} \le \frac{1}{\prod_{i=1}^{t} \left( 1 + \frac{1}{ir} \right)}.$

The bound in the above proposition cannot be achieved in most cases. Wang et al. [42]
gave a construction from the incidence matrices of some combinatorial designs:


Proposition 5.2 ([42]). For any $r$ and $t$, there are binary linear codes $C[n, k, d]$ with
locality $r$ and availability $t$ satisfying

$\frac{k}{n} = \frac{r}{r + t}$ and $d = t + 1$.

Note that for fixed $t$, the minimum distance of the construction in the above proposition
is fixed. But as discussed at the beginning of this paper, the minimum distance of LRCs is
a very important metric, especially for multiple erasures. So the issue we address in what
follows is how to construct LRCs with high rate, large minimum distance, and any locality
and availability.

Next, we only consider the binary case. The method works as well for non-binary cases.
Constructing a binary LRC $C[n, k, d]$ with locality $r$ and availability $t$ is equivalent to
constructing a parity check matrix $H$ such that each column has Hamming weight $\ge t$, each
row has Hamming weight $\le r + 1$, and the inner product (over the integers) of any two rows
is at most 1. Note that rows of $H$ might be linearly dependent. Corresponding to this parity
check matrix

$H = (h_{i,j})_{1 \le i \le m,\ 1 \le j \le n},$

there is a bipartite graph $G$ whose bi-adjacency matrix is $H$. Explicitly, the graph $G = (V, E)$
is defined as follows:

• The set $V$ of vertices is separated into two parts $\{c_1, c_2, \dots, c_m\}$ and $\{x_1, x_2, \dots, x_n\}$,
which represent rows and columns, respectively. The vertices $\{c_1, c_2, \dots, c_m\}$ are
called constraints, and the vertices $\{x_1, x_2, \dots, x_n\}$ are called variables.

• The set $E$ of edges: there are no edges connecting vertices in the same part; every
edge connects vertices from distinct parts. Precisely, there is an edge
between $c_i$ and $x_j$ if and only if $h_{i,j} = 1$, for any $1 \le i \le m$, $1 \le j \le n$.

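Example 5.1. For the matrix

$H = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0
\end{pmatrix},$

the corresponding graph is shown in Figure 1. The matrix $H$ defines a [9,4] binary code with locality
2 and availability 2.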
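
The parameters claimed in Example 5.1 can be checked mechanically. The sketch below is an illustration only: the helper `gf2_rank` and the string encoding of the rows are choices made here, not part of the paper. It verifies the row and column weights of $H$ and computes the dimension $k = n - \mathrm{Rank}(H)$ over $F_2$.

```python
# Rows of the matrix H from Example 5.1, written as bit strings.
H_ROWS = [
    "100100100",
    "010010010",
    "001001001",
    "100010001",
    "010001100",
    "001100010",
]


def gf2_rank(rows: list[str]) -> int:
    """Rank over F_2 via Gaussian elimination on rows stored as integers."""
    pivots: dict[int, int] = {}  # leading bit position -> reduced row
    for row in rows:
        vec = int(row, 2)
        while vec:
            lead = vec.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = vec
                break
            vec ^= pivots[lead]  # cancel the leading bit and keep reducing
    return len(pivots)


n = len(H_ROWS[0])
row_weights = [row.count("1") for row in H_ROWS]
col_weights = [sum(row[j] == "1" for row in H_ROWS) for j in range(n)]
k = n - gf2_rank(H_ROWS)

print("row weights:", row_weights)     # each row checks r + 1 symbols
print("column weights:", col_weights)  # each symbol lies in t checks
print("length n =", n, "dimension k =", k)
```

The rank is smaller than the number of rows because the first three rows and the last three rows both sum to the all-ones vector.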



FIGURE 1. The bipartite graph G representing H 


In the literature, this bipartite graph is called the Tanner graph. Recall that the degree of
a vertex $v \in V$ is defined to be the number of edges incident to the vertex, and is denoted
by $\deg(v)$. In this setting, in order to construct a binary LRC $C[n, k, d]$ with locality $r$
and availability $t$, the problem is reduced to constructing a bipartite graph $G$ with vertices
$\{c_1, c_2, \dots, c_m\} \cup \{x_1, x_2, \dots, x_n\}$ such that $\deg(c_i) \le r + 1$ and $\deg(x_j) \ge t$ for any
$1 \le i \le m$, $1 \le j \le n$. Meanwhile, to simplify the discussion, we only consider the regular case.
That is, the Tanner graph is a regular graph, where

$\deg(c_1) = \deg(c_2) = \cdots = \deg(c_m) = r + 1$

and

$\deg(x_1) = \deg(x_2) = \cdots = \deg(x_n) = t.$

In this case, we have the following lower bound for the corresponding code: 


Proposition 5.3. The rate $\rho$ of the binary code satisfies

$\rho \ge 1 - \frac{t}{r + 1}.$

Proof. By counting the number of 1's in the bi-adjacency matrix $H$ of $G$, we have

$m(r + 1) = nt$, or $\frac{m}{n} = \frac{t}{r + 1}.$

So the rate of the code is

$\rho = 1 - \frac{\mathrm{Rank}(H)}{n} \ge 1 - \frac{m}{n} = 1 - \frac{t}{r + 1},$

where $\mathrm{Rank}(H)$ is the $F_2$-rank of $H$. □


It is very difficult to compute the exact value of $\mathrm{Rank}(H)$ in general, but it is
an important issue in many application scenarios. For graphs with strong combinatorial
properties, computing $\mathrm{Rank}(H)$ has attracted a lot of interest [Hl5ll8l lT8ll2Tll3l].

There are advantages to the graphical representation of codes [101 UU]. It generalizes
low-density parity-check codes, convolutional codes, trellis codes, classical linear system theory,
behavior systems theory, etc. Fast algorithms on graphs give efficient encoding and decoding
algorithms, such as the sum-product algorithm, the BCJR algorithm, the Viterbi algorithm, etc. In
our specific case of LRCs, when the information of a node is unavailable or damaged, it
is easy to recover it by adding the information of the other variable vertices adjacent
to any constraint neighboring that node in the graph. Even if many variable nodes are damaged, we
can trace through the graph for intact information to recover the damaged nodes, provided
that the number of damaged nodes is less than the minimum distance of the code. This is
our motivation for making the minimum distance of LRCs with the required locality and
availability as large as possible.

Next, we restrict ourselves to the case $t = 2$, where there are two disjoint repair options
for each coordinate. In other words, the degree of $x_j$ ($1 \le j \le n$) is two in the Tanner graph.
In this case, the rate of the code is

$\rho \ge \frac{r - 1}{r + 1},$

which might be smaller than that of [42] by a difference of (at most)

$\frac{r}{r + 2} - \frac{r - 1}{r + 1} = \frac{2}{(r + 1)(r + 2)}.$

By a slight sacrifice of the code rate, we can construct codes with much larger minimum
distance. More concretely, the code in [42] has minimum distance 3, but our codes have
minimum distance $O(\log n)$.
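
The rate difference above is a one-line identity, and it can be confirmed with exact rational arithmetic. The following is a small sketch; the function name `rate_gap` and the sampled values of $r$ are chosen here for illustration.

```python
from fractions import Fraction


def rate_gap(r: int) -> Fraction:
    """Difference r/(r+2) - (r-1)/(r+1) between the two rate expressions for t = 2."""
    return Fraction(r, r + 2) - Fraction(r - 1, r + 1)


# The gap should equal 2 / ((r + 1)(r + 2)) and shrink quadratically in r.
for r in (2, 3, 5, 10, 50):
    assert rate_gap(r) == Fraction(2, (r + 1) * (r + 2))
    print(f"r = {r:2d}: rate gap = {rate_gap(r)}")
```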

Since $\deg(x_j) = 2$ for all $1 \le j \le n$, the Tanner graph $G$ can be reduced to a smaller
graph $G_{red}$:

• The vertices are $\{c_1, c_2, \dots, c_m\}$.

• There is an edge between $c_i$ and $c_j$ if and only if $c_i$ and $c_j$ are both connected to
some $x_l$ in the graph $G$.

The reduced graph $G_{red}$ is an $(r + 1)$-regular graph. One could also regard the reduced graph
$G_{red}$ as another graphical model of the code $C$. The difference between the two graphs is
that the Tanner graph treats the variables (or columns) as one part of the bipartite graph,
whereas the reduced graph treats the variables as its edges.

Example 5.2. Continue Example 5.1. The reduced graph $G_{red}$ is shown in Figure 2.



FIGURE 2. The reduced graph representing H 
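
The passage from the Tanner graph to the reduced graph can be sketched in a few lines. The code below is an illustration under stated assumptions: the function names are invented here, the rows of $H$ are those of Example 5.1, and the girth routine assumes a simple graph (no multi-edges), which is the setting of this paper.

```python
from collections import deque

# Rows of the matrix H from Example 5.1, written as bit strings.
H_ROWS = [
    "100100100",
    "010010010",
    "001001001",
    "100010001",
    "010001100",
    "001100010",
]


def reduced_edges(rows: list[str]) -> list[tuple[int, int]]:
    """One edge (i, j) per column: the two constraints that contain that variable."""
    edges = []
    for col in range(len(rows[0])):
        ends = [i for i, row in enumerate(rows) if row[col] == "1"]
        assert len(ends) == 2, "each variable must have degree 2"
        edges.append((ends[0], ends[1]))
    return edges


def girth(num_vertices: int, edges: list[tuple[int, int]]) -> int:
    """Shortest cycle length of a simple graph (0 if acyclic), by BFS from each vertex."""
    adj: list[list[int]] = [[] for _ in range(num_vertices)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    best = 0
    for root in range(num_vertices):
        dist = {root: 0}
        parent = {root: -1}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    parent[w] = v
                    queue.append(w)
                elif parent[v] != w:
                    # A non-tree edge closes a cycle through the BFS tree.
                    cycle = dist[v] + dist[w] + 1
                    if best == 0 or cycle < best:
                        best = cycle
    return best


edges = reduced_edges(H_ROWS)
degrees = [sum(v in e for e in edges) for v in range(len(H_ROWS))]
print("edges of the reduced graph:", edges)
print("degrees:", degrees)
print("girth:", girth(len(H_ROWS), edges))
```

For this matrix every edge joins one of the first three constraints to one of the last three, so the reduced graph is the complete bipartite graph on 3 + 3 vertices.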


To analyze the minimum distance of the code, we need one more parameter of the Tanner
graph or the reduced graph. The girth of a graph is the length of a shortest cycle in the
graph. Since only graphs without multi-edges are involved in this paper, it is easy to see
that the girth of a graph is 0 or $\ge 3$, and the girth of a bipartite graph is an even integer: 0 or $\ge 4$.

Theorem 5.1. Let $G_{red}$ be an $(r + 1)$-regular graph with $m$ vertices and girth $g$. Extend
the graph $G_{red}$ to a bipartite graph $G$ with regularity 2 and $r + 1$. The null space of the
bi-adjacency matrix $H$ of $G$ defines our binary linear code $C$. Then the code $C$ has length
$\frac{(r+1)m}{2}$, dimension $\ge \frac{(r-1)m}{2} + 1$, minimum distance $g$, locality $r$, and availability 2.

Proof. The only thing we need to prove is that the minimum distance is $d = g$. On one hand,
W.L.O.G., let $c_1, c_2, \dots, c_g$ be a cycle of length $g$ in $G_{red}$. Then it extends to a cycle of
length $2g$ in $G$, say $c_1, x_1, c_2, x_2, \dots, c_g, x_g$. By the fact that $\deg(x_j) = 2$, the restriction of the
parity check matrix $H$ to the columns $x_1, x_2, \dots, x_g$ is

$\begin{array}{c|ccccc}
 & x_1 & x_2 & x_3 & \cdots & x_g \\ \hline
c_1 & 1 & 1 & 0 & \cdots & 0 \\
c_2 & 0 & 1 & 1 & \cdots & 0 \\
c_3 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \ddots & \\
c_g & 1 & 0 & 0 & \cdots & 1 \\
\text{other } c_i & 0 & 0 & 0 & \cdots & 0
\end{array}$

which defines a codeword with support set $\{1, 2, \dots, g\}$. So the minimum distance satisfies $d \le g$.

On the other hand, let $c$ be a codeword of Hamming weight $d$. W.L.O.G., assume the
support of $c$ is $\{1, 2, \dots, d\}$. By non-zero location chasing, we prove $d \ge g$. We can also assume
the variables $x_1, x_2$ are connected to the constraint $c_1$. Now, $x_2$ has another neighboring
constraint, say $c_2$. As the codeword $c$ must satisfy the constraint $c_2$, there is at least one
$x_j$ connected to $c_2$ for some $1 \le j \le d$, $j \ne 2$. If $j = 1$, then we get a cycle of length 4 in $G$, so
$g = 2 \le d$ and the proof is finished. Otherwise, assume $x_3$ is connected to $c_2$. If $x_3$ is also
connected to $c_1$, then the proof is finished in the same way as before. Otherwise, $x_3$ is connected
to another constraint. Then iterate the same procedure. One can finally get a cycle of length $\le 2d$ in
the graph $G$. So the girth of the reduced graph $G_{red}$ is $g \le d$.

In conclusion, we have proved the minimum distance $d = g$.

□ 
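
For a code as small as the one in Example 5.1, the statement $d = g$ can be checked by exhaustive search. The sketch below is only an illustration (the helper names are chosen here); it enumerates all binary words of length 9, keeps those in the null space of $H$, and reports the smallest non-zero weight. The reduced graph of this $H$ is the complete bipartite graph on 3 + 3 vertices, whose girth is 4.

```python
from itertools import product

# Rows of the matrix H from Example 5.1, written as bit strings.
H_ROWS = [
    "100100100",
    "010010010",
    "001001001",
    "100010001",
    "010001100",
    "001100010",
]
H = [[int(ch) for ch in row] for row in H_ROWS]


def is_codeword(word: tuple[int, ...]) -> bool:
    """True when every parity check (row of H) is satisfied over F_2."""
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def codewords() -> list[tuple[int, ...]]:
    """All codewords, found by brute force over the 2^9 binary words."""
    return [w for w in product((0, 1), repeat=len(H[0])) if is_codeword(w)]


def minimum_distance() -> int:
    """Smallest Hamming weight of a non-zero codeword of this linear code."""
    return min(sum(w) for w in codewords() if any(w))


print("number of codewords:", len(codewords()))
print("minimum distance:", minimum_distance())
```

Brute force is only feasible for toy lengths; it is used here purely as a sanity check of the theorem on one instance.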

The theorem extends the result of [[131 Proposition 2]. For the non-binary case, Chen et al. [6]
proposed how to enlarge the minimum distance by choosing proper non-zero elements at the
non-zero locations of $H$. By Theorem 5.1, in order to construct a binary LRC with rate $\frac{r-1}{r+1}$,
locality $r$, availability 2, and minimum distance as large as possible, we need to construct
an $(r + 1)$-regular graph with girth $g$ as large as possible. This latter problem of graph
construction has been extensively studied in extremal graph theory.

Let $g(m, r)$ denote the largest possible girth of an $(r + 1)$-regular graph of size at most $m$,
then for fixed $r$ and asymptotically growing $m$ we have

(5.1) $\left( \frac{4}{3} - o(1) \right) \log_r m \le g(m, r) \le (2 + o(1)) \log_r m.$

The second inequality in (5.1) is a version of the Moore bound [3, Theorem III.1]. Note that
the Moore bound is not achievable in most cases. The girths of random Cayley graphs are
tested in [12]. The first explicit construction can be found in [22], for graphs with degree 4
and large girth $\ge 0.83 \log_r m$ and those with arbitrary degree and large girth $\ge 0.34 \log_r m$,
the latter of which was later improved in na to $\ge 0.48 \log_r m$. Erdős and Sachs [7] described
a simple procedure yielding families of graphs with large girth $\log_r m$. Examples of graphs
with arbitrary degree and large girth $\ge \frac{4}{3} \log_r m$ are given in [2|l20 |l23l 1231133]. Using these
explicit constructions, we can obtain

Theorem 5.2. Let $G_{red}$ be an $(r + 1)$-regular graph with $n$ edges and girth $g = O(\log n)$.
Extend the graph $G_{red}$ to a bipartite graph $G$ with regularity 2 and $r + 1$. The null space of
the bi-adjacency matrix $H$ of $G$ defines our binary linear code $C$. Then the code $C$ has length
$n$, dimension $\ge \frac{(r-1)n}{r+1} + 1$, minimum distance $O(\log n)$, locality $r$, and availability 2.
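
As a concrete instance of the graph-to-code recipe, the sketch below applies it to the Petersen graph, a 3-regular graph with 10 vertices, 15 edges and girth 5. This choice of graph is an assumption made here for illustration and does not come from the paper; the function names are likewise invented. With $r + 1 = 3$, the recipe predicts a binary code of length 15, dimension at least $\frac{(r-1)n}{r+1} + 1 = 6$, and minimum distance equal to the girth.

```python
def petersen_edges() -> list[tuple[int, int]]:
    """Edges of the Petersen graph: outer 5-cycle, inner pentagram, and 5 spokes."""
    outer = [(i, (i + 1) % 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    return outer + inner + spokes


def code_parameters(num_vertices: int, edges: list[tuple[int, int]]) -> tuple[int, int, int]:
    """Brute-force [n, k, d] of the binary code with one variable per edge
    and one parity check per vertex (the vertex-edge incidence matrix)."""
    n = len(edges)
    # masks[v] has bit e set when edge e is incident to vertex v.
    masks = [
        sum(1 << e for e, (a, b) in enumerate(edges) if v in (a, b))
        for v in range(num_vertices)
    ]
    weights = []
    for word in range(1 << n):
        if all(bin(word & m).count("1") % 2 == 0 for m in masks):
            weights.append(bin(word).count("1"))
    k = len(weights).bit_length() - 1  # number of codewords is 2^k
    d = min(w for w in weights if w > 0)
    return n, k, d


n, k, d = code_parameters(10, petersen_edges())
print(f"[n, k, d] = [{n}, {k}, {d}]")
```

The exhaustive search over $2^{15}$ words is only practical for such small graphs; for the asymptotic statement one relies on the explicit large-girth families cited above.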

Compared with the constructions in [30,32], our codes have a slightly lower rate;
however, our codes have much larger minimum distances. Compared with the construction
of |3T, Theorem 3.1], the minimum distances of their codes are apparently very large. On
one hand, their construction relies highly on the size of the finite field, so their method
cannot be employed for the binary case. On the other hand, if their code rate achieves $\frac{r-1}{r+1}$, the
minimum distance of their code degenerates to 1. Actually, the minimum distance $O(\log n)$
in the above theorem is already optimal in the case $t = 2$ by HD Theorem 2.5].

Remark 5.1. Analogously to the performance of random linear codes, for general locality $r$
and availability $t \ge 3$, the codes constructed from random $(r + 1, t)$-regular bipartite graphs
have minimum distances growing linearly in the length of the code with very high
probability m Theorem 2.f]. To our knowledge, there is no deterministic construction
of $(r + 1, t)$-regular bipartite graphs (arbitrary $r$ and $t$) such that the corresponding codes
have non-zero relative minimum distance $\delta$ asymptotically.
6. Conclusions


In the first part of this paper we studied the open problem in [40]: when $n_1 \le n_2$, what
is the largest possible minimum distance for an $[n, k]$ LRC, and how can one construct an
$[n, k]$ LRC with the largest possible minimum distance? For the first problem, we solved the
integer programming problem in the case $n_1 \le n_2$ and derived a new upper bound which is
never worse than the classic bound (1.1). For the second problem, we found that the
construction of Tamo and Barg [37] is actually optimal when $n_1 \le n_2$, $u + v > r$ and $n_2 \ne r$.
Using another interpolation polynomial, we presented a construction of optimal LRCs when
$n_1 \le n_2$ and $u + v + n_2 - n_1 \le r$.

In the second part of this paper, we presented a graphical model for binary LRCs with any
locality and any availability. In particular, for any locality and availability 2, we used
deep results from extremal graph theory to give a code construction which produces good
LRCs in the sense that these codes satisfy the locality and availability requirements and have
high code rates and large (indeed optimal) minimum distances.